Use cases



  1. Scraping wine prices for the Agricultural Price Indices (API)

  2. NACE classification in the course of the NACE revision

Scraping wine prices

  • Aim: collect wine prices from websites of Austrian winemakers

  • Use: proxy for the agricultural producer price index

  • Rough outline of actions:

    1. Start with units from the Statistical Business Register that run a vineyard
    2. Search for their URLs and look for a webshop
    3. Scrape the price list of wines
    4. Estimate the average price for different wines
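
Steps 3 and 4 could be sketched as follows. This is a minimal illustration, not the actual pipeline: it assumes the price list has already been scraped into (wine type, price text) pairs, that prices use a decimal comma, and that unparseable entries are simply skipped. The function names and example rows are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def parse_price(text):
    """Convert a scraped price string such as '€ 9,90' to a float, or None."""
    cleaned = text.replace("€", "").replace("EUR", "").strip()
    # Assumption: a decimal comma is used; thousands separators are not
    # handled in this sketch.
    cleaned = cleaned.replace(",", ".")
    try:
        return float(cleaned)
    except ValueError:
        return None  # e.g. "auf Anfrage" (price on request)


def average_price_by_wine(rows):
    """Average the parseable prices per wine type.

    rows: iterable of (wine_type, price_text) pairs, assumed already scraped.
    """
    prices = defaultdict(list)
    for wine_type, price_text in rows:
        price = parse_price(price_text)
        if price is not None:
            prices[wine_type].append(price)
    return {wine: round(mean(values), 2) for wine, values in prices.items()}


# Invented example rows, not real data.
rows = [
    ("Riesling", "€ 9,90"),
    ("Riesling", "11,10 EUR"),
    ("Zweigelt", "12,00 €"),
    ("Zweigelt", "auf Anfrage"),
]
averages = average_price_by_wine(rows)
```

A plain mean is only one possible estimator; weighting by bottle size or vintage would need additional fields from the webshop.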

Scraping wine prices (2)


Initial thoughts/issues:

  • Can we find suitable websites for many vineyards?


  • Are prices really listed on the website and in what form?


  • Would it suffice to find URLs (and prices) for a sample of vintners only?
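
If a sample turns out to be sufficient, drawing it could look like the sketch below. It assumes a simple random sample without replacement and a fixed seed for reproducibility; the identifiers, sample size, and function name are hypothetical, and a stratified design (e.g. by region or size class) might be more appropriate in practice.

```python
import random


def draw_sample(unit_ids, sample_size, seed=2024):
    """Draw a reproducible simple random sample without replacement."""
    rng = random.Random(seed)
    # Never ask for more units than are available.
    size = min(sample_size, len(unit_ids))
    return rng.sample(list(unit_ids), size)


# Hypothetical register identifiers, for illustration only.
vintner_ids = [f"unit-{i:04d}" for i in range(1, 501)]
sample = draw_sample(vintner_ids, sample_size=50)
```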

NACE Classification

  • NACE classification using text from enterprise websites was already thoroughly investigated during WIN (WP3/UC5)


  • Massive workload for re-classification with the NACE revision

  • Re-classification of a specific NACE group


NACE2008


G 47 ~ retail trade and specifically online shops
Contains 40 000 - 50 000 enterprises

NACE2025
G 47.11 to G 47.83
Retail sale of food, textiles, IT equipment, …
About 37 five-digit codes

NACE Classification (2)



  • Rough outline of actions:

    1. Start with units from the Statistical Business Register
    2. Search for their URLs and look for a webshop
    3. Scrape products on webshops
    4. Classify products on webshops - possibly with the help of COICOP?
    5. Derive a rule or model to assign the new NACE 2025 codes
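
One very simple candidate for the rule in step 5 is a majority vote over the classified products of a shop. The sketch below is an assumption for illustration only: the product categories, the placeholder codes, the `min_share` threshold, and the function name are all hypothetical, and real NACE 2025 codes would have to come from the official classification.

```python
from collections import Counter


def classify_shop(product_categories, category_to_code, min_share=0.5):
    """Assign the code of the dominant product category, or None.

    Returns None when no category reaches min_share of the scraped
    products, so that the unit can be sent to manual review.
    """
    counts = Counter(product_categories)
    if not counts:
        return None
    top_category, top_count = counts.most_common(1)[0]
    share = top_count / sum(counts.values())
    if share < min_share:
        return None
    return category_to_code.get(top_category)


# Placeholder mapping, NOT real NACE 2025 codes.
category_to_code = {
    "food": "CODE_FOOD",
    "textiles": "CODE_TEXTILES",
    "it": "CODE_IT",
}
code = classify_shop(["food", "food", "textiles"], category_to_code)
```

A trained model could replace this rule later; the vote mainly serves as a transparent baseline.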

NACE Classification (3)



Initial thoughts/issues:


  • Is it possible to scrape a webshop extensively? (might get blocked)
  • Is it feasible to classify products listed on the web?
    • Will cosine distance between word embeddings suffice?
  • How much needs to be scraped from a single website?
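
The cosine-based idea could be sketched as below: each product is assigned to the category whose embedding is closest, where cosine distance is 1 minus cosine similarity. The three-dimensional vectors are made up for illustration; real embeddings would come from a language model, and the category names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if one is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_category(product_vector, category_vectors):
    """Return the category with the highest similarity (lowest distance)."""
    return max(
        category_vectors,
        key=lambda name: cosine_similarity(product_vector, category_vectors[name]),
    )


# Toy vectors, not real embeddings.
category_vectors = {
    "food": [1.0, 0.1, 0.0],
    "textiles": [0.0, 1.0, 0.2],
}
product_vector = [0.9, 0.2, 0.1]
best = nearest_category(product_vector, category_vectors)
```

Whether this suffices remains an open question; a threshold on the best similarity could flag ambiguous products for manual checking.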





Thanks for listening!

Questions?